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Abstract: Ptychography promises diffraction limited resolu- 
tion without the need for high resolution lenses. To achieve high 
resolution one has to solve the phase problem for many partially 
overlapping frames. Here we review some of the existing methods 
for solving ptychographic phase retrieval problem from a numerical 
analysis point of view, and propose alternative methods based on 
numerical optimization. 
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1. Introduction 

An emerging imaging technique in X-ray science is to use a localized moving probe 
to collect multiple diffraction measurements of an unknown object IB 121 [3 S |5j |6l 
|7l m. This technique is called "ptychography". In a ptychography experiment, one 
collects a sequence of diffraction images of dimension mxm. Each image frame jxlr') 
represents the magnitude of the Fourier transform of a{r)\j/{r -\-x), where a(r) is a 
localized illumination (window) function or a probe, \j/{r) is the unknown object of 
interest, and x is a translational vector. We can express as 

3;x(r') = |^{a(r)Vi^(r + x)}|, (1) 

where ^{/} denotes the Fourier transform of / with respect to r. 

In order to reconstruct the unknown object, we must retrieve the phases of the meas- 
ured images. A number of methods have been proposed to recover \j/{r) from ptycho- 
graphic measurements 3'x(r') IH IH HI [71 [H. The connection among these methods is 
not entirely clear from the existing literature. Furthermore, little detail is provided on 
the convergence rate or computational efficiency of these methods. 
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In this paper, we review some of the existing methods for solving ptychographic 
phase retrieval problem from a numerical analysis point of view, and propose to solve 
the problem by alternative methods that are standard in the numerical optimization 
community. In particular, we formulate the ptychographic phase retrieval problem as 
an unconstrained nonlinear minimization problem in section |2j and compare the con- 
vergence of several well known iterative methods for solving this type of problem in 
section [6] We discuss computational details such as line search and preconditioning 
that are important for achieving optimal performance in these methods in section [2] 
We also describe the connection between optimization based algorithms and projection 
algorithms that are often discussed in the phase retrieval literature in section [4] 

We point out that ptychographic minimization problem is not globally convex, which 
means that iterative methods can be trapped at a local minimizer if a poor starting 
guess is chosen. We show by a numerical example that one way to escape from a local 
minimizer is to switch to a different objective function in section |6] 

We observed that the convergence of the optimization based iterative algorithms used 
to perform ptychographic phase retrieval is accelerated when the amount of overlap 
between two adjacent image frames increases. We provide an intuitive explaination on 
why the amount of overlap between adjacent frames affects the convergence of iterative 
optimization algorithms in section [6] 

An alternative approach for performing ptychographic phase retrieval is a method 
known as Wigner deconvolution. We review this approach in section [5] and point out its 
connection to iterative algorithms and its limitations. 

We use standard linear algebra notation whenever possible to describe various quan- 
tities evaluated in the iterative algorithms we present. To simplify notation we use a/b 
to denote an element-wise division between two vectors a and b. Similarly, we use a ■ b 
to denote an element-wise multiplication of a and b. We also use and a^/^ occasion- 
ally to denote the element- wise square and square root of a respectively. The conjugate 
of a complex variable a is denoted by a. The real part of a is denoted by Re (a). The 
conjugate transpose of a matrix (or a vector) A is denoted by A*. The \x\ symbol is re- 
served for the magnitude (or absolute value) of x. The Euclidean norm of x is denoted 
by \\x\\ = \fx*x. We use Diag (x) to represent a diagonal matrix with the vector x on its 
diagonal. 

2. Ptychographic reconstruction formulated as an optimization problem 

The problem we would like to solve is to recover from a number of intensity measure- 
ments represented by ([T]). For a finite set of translational vectors x,, we will denote each 
measurement by 

hi = \FQi^\, i=l,2,..;k, 

where \j/ is the sampled unknown object that contains n pixels, bi is a sampled measure- 
ment that contains m pixels, F is the matrix representation of a discrete Fourier trans- 
form, and Qi is an m x « "illumination matrix" that extracts a frame containing m pixels 
out of an image containing n pixels. Each row of Qi contains at most one nonzero ele- 
ment. The nonzero values in Qi are determined by the illumination function a{r). 
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Fig. 1. An unknown object of interest ^r, and the measured amplitudes Zx related 
by matrix operations 

Given a set of measurements, b\, b2, ■■, bk, we may attempt to recover by solving 
the least squares problem 

1 

min-£|||z;|-^,-f, (2) 

where z, = FQiY, and the factor of 1 /2 is included here merely for convenience. 
An alternative objective function we may use to recover \j/ is 

e=^^i\M'-bn\ (3) 
^ 1=1 

where and bj denote vectors obtained from squaring each component of and 
bi respectively. The advantage of using Q is that it is more differentiable, hence more 
amenable to analysis. In practice, we found the objective function in ([2]) to be a better 
choice in terms of computational efficiency in most cases. 

To obtain the minimizers of Q or Q using numerical optimization techniques, we 
often need to evaluate the gradient and possibly the Hessian of these objective func- 
tions. Because both ([2]) and (|3]) are real-valued functions of a (potentially) complex 
vector i/A, derivative calculations must be done with care. One can either take the deriva- 
tive of (|2]) and Q with respect to the real and imaginary parts of Y independently or 
follow the CM-calculus formalism established in iflOl [TTl by treating y and \j/ as two 
independent variables. The latter approach is what we use throughout this paper. 
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2.1. Gradient 



If we let r, 



-bj, and define 



we can rewrite ^ as £{^f) = r^r/2. Let the matrix = 5r,/5i/A be the Jacobian of r/ 
with respect to i//. It follows from the chain rule that the gradient of e in vector form is 



V£(va) 



de dr\ " 
dr dY J 



7* 



(4) 



where 



J = 



\h ) 

Note that we may rewrite as Diag(z,)*z,-, where Diag(A:) denotes a diagonal 
matrix that contains the vector x on its diagonal and z, = FQjY- Using this observation, 
we can show that 



Ji 



dDiag (zi) Zi dzi , dDiag{zi)zi dz 



+ ■ 



dxf/ dzi dY dzi dxj/ 

It follows from (HI) and the above expression that 



: Diag{zi)FQt = DiagiFQiYTFQi. 

(5) 



Ve = l^Q*F*Dmg{zi)[\zi\'-bj]. 



(6) 



!=1 



The gradient of the objective function in which we will denote by p{y)^ is 
slightly more complicated. By rewriting \zi\ as (Izip)^/^, with the understanding that 
the square root is taken component-wise, and by using the chain rule and replacing 
d\zi\^/dY with the expression given in (pjl, we obtain 



^ d\zi\ d{\z,\^y/^ d\zi\^ 1 



if does not contain any zero element for all / = 1,2, ...,m. 
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Consequently, we may now express Vp(vA) as 



Vp{Y)=J*r 



(=1 



£e*f*Diag 



' (=1 



{\Zi\ 



Zi - Diag 
F!2,i/^-Diag 



!2*aT-e*^*Diag 



\Zi 



bi 



(7) 



Recall that Zi = FQiY- Thus, the expression Diag(F2,i/A)Diag(|z,|)^'Z7,- simply repre- 
sents projecting y onto the Fourier magnitude constraint imposed by the data Z?,. Note 
that the expression given above for the gradient of p(v^) is only valid when does 
not contain any zero for all / = 1,2, ...,m. If contains a zero component for some /, 
and if the corresponding component in bi is nonzero, Vp is not well defined, i.e., Vi/a 
has singularities at y/'s where FQiY contains a zero element for some /. 

Note that both (|6]) and ^ remain real when y is real and when bj is obtained from a 
discrete Fourier transform of a real image (so that conjugate symmetry is preserved in 
Biag{zi/\zi\)bi.) 

The directional derivatives of e and p along a direction ^ are defined by 

k 



de 



de 



1=1 



{\z,\^-bjfDiag{ziTFQi^ 



and 



3^0 + 3^0 = y Re 



rQlQi¥-rQlF*Diag 



(8) 



(9) 



respectively 



2.2. Hessian 

The Hessian of £(i/a) and p(v^) provides information on the convexity of these objec- 
tive functions. A globally convex function has a unique minimizer. Such a minimizer 
can be obtained by standard optimization techniques that we will describe in the next 
section. If the objective function is not convex everywhere, a standard optimization al- 
gorithm may produce a local minimizer that is not the true solution to ptychographic 
reconstruction problem. 

Again, because both £(v^) and p{\l/) are real valued functions of a potentially com- 
plex vector their Hessians are defined as 



H" 



HI 



ZJO 



TJO TJO 
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where 



H 



WW 



and o is either £ or p. 

It is not difficuh to show that 



= l!2;F*Diag(2|z;p-Z.2)Fft, 
^Iv^ = Y.Q^F^T>i?ig{2\zi\^-b})FQi, 

i 

tiw = Ie*F*Diag(z,)'fa, 



(10) 

(11) 
(12) 
(13) 



If we let tji = \tji\e'^i<, C,ji = and jSy,- be the 7th component of U = FQi^, 

Zi = FQiY and bj respectively, then the curvature Te(i//, 0) at y along any direction <p 
can be calculated as follows 

- Vit* fT\( Diag(2|z,|2-Z.2) Diag(z,)' \ ( \ 

'\ ^mizif Diag(2|z,|2-Z,2) 
= £2f;Diag (2|z,-|2 - bi) ti + 2Re[ff Diag [zifu] 

i 

= £2f;Diag - b'i) r,- + 2 (^f/Diag (|z,-|)'f,- + Re[ff Diag [zifh] 



k n 



2lIk;vl'(M'-i8^-) + (M'M^+Re 



i=lj=l 

k n 



(tjiZji) 



l5})+2\tji\\ji\^cos^{^ji-eji). (14) 



i=ij=i 

At the minimizer of £(i//^), = bi. So the first term of ( 14 1 is zero. Because the second 
term of (14i is nonnegative, T > 0, i.e., e is convex at the solution. Moreover, the 
convexity of e is preserved in the area where \zji\ > j3y,. 

A similar observation can be made from the curvature of p. It is not difficult to show 
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that 



ttP 



rrP 



-Gff^Diag 



FQi 



^Xefi^^Diag 



FQi, 
FQi. 



(15) 
(16) 
(17) 
(18) 



It follows that 



1 



ttP 

TTfJ TTp 



(19) 




i=ij=i 



Y k n 

^i=ij=\ 



II M 



■ + ■ 



rRe 



Of I 1 0' 

Sin^(/ly,-0y,- 



10 



(20) 



Thus, Tp > when |(^y,| > j3y; for all j = l,2,...,?i and / = 1,2, Even if is 
slightly less than jSy,- for some j and /, Tp may remain positive when the correspond- 
ing sin^(/Xy,- — dji) is sufficiently small and other terms in the summation in (20 1 are 
sufficiently large and positive. 

A typical problem encountered in optics is when k = I. When only one diffraction 
image is recorded, experience shows that local minima are common. Regions of nega- 
tive curvature separate local minima from the global solution. 

3. Iterative Algorithms based on Nonlinear Optimization 

Because the gradient and Hessian of Q and Q are relatively easy to evaluate, we 
may use standard minimization algorithms such as the steepest descent method, the 
Newton's method and the nonlinear conjugate gradient method to find the solution 
to the ptychographic reconstruction problem. We will review some of these methods 
in section |3.1| and discuss some techniques for improving the performance of these 



algorithms in the rest of this section. 
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3. 1. Basic Algorithms 

In many standard numerical optimization algorithms, we construct a sequence of ap- 
proximations to xjf by 

ym) = yW+PpW^ (21) 

where p^^'^ is a search direction along which the objective function (j2]l or ([s]) decreases, 
and j8 > is an appropriate step length. 

The simplest type of search direction is the steepest descent direction 



where o is either £ or p. When the Hessian of p or e is positive definite at Xj/^ \ the 

(I) 

Newton's direction p^^ , which is the solution of 



TTO ZJO 

Tjq Tjq 





(22) 



is also a descent direction. 

Due to the nonlinear least squares nature of the objective functions (|2]) and ([3]l, we 
may replace the true Hessian in ( [22] ) by a simpler approximation constructed from the 
Jacobian of the residual functions |z,| —b, or — bj for / = 1,2, ...,k. This technique 
yields the Gauss-Newton (GN) search directions. 

Both Newton's method and the Gauss-Newton method require solving a system of 
linear equations at each step in order to obtain a search direction. Because the dimen- 
sion of these linear systems is n xn, where n is the number of pixels in the image to 
be reconstructed, constructing the Hessian or Jacobian and solving these equations by 
matrix factorization based methods will be prohibitively expensive. Iterative methods 
that make use of matrix vector multiplications without forming the Hessian or the J 
matrix explicitly are more appropriate. However, several iterations may be required to 
reach a desired accuracy needed to produce a good search direction. Hence methods 
based on Newton or Gauss-Newton search directions tend to be more expensive. 



The Hessian in (22 1 can also be replaced by approximations constructed from 
changes in the gradient computed at each iteration. Such approximation yields Quasi- 
Newton search directions. 

Another commonly used search direction is the conjugate gradient direction defined 

by 

Peg — S ^ '-"'Peg , 

where g^^'^ is the gradient of (|2]l or ([s]) at i/A^^-' and a is often chosen to be 

Re[(^M)*(gW_^(^-i))] 
r/ = = =- 

||^(/-l)||2 

This choice of a yields what is known as the Polak-Ribiere conjugate gradient method. 



There are a variety of ways to choose an appropriate step length j8 in (21 1. They are 



often referred to as line search methods. The purpose of line search is to ensure that the 
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objective function decreases as we move from Xf/^^^ to y^^^^^ so that y^^^ will converge 
to at least a local minimizer as £ increases. Such type of convergence is often called 
global convergence. 

Another way to achieve global convergence in an iterative optimization procedure is 
to use the trust region technique to determine a search direction and step length simul- 
taneously. Under the trust region framework, we minimize the true objective function 
by minimizing a sequence of simpler "surrogate" functions that mimic the behavior of 
the true objective function within a small neighborhood of the current approximations. 
That is, in each step of this iterative optimization procedure, we solve what is called a 
trust region subproblem 

min ^(VaW +(/.), (23) 

II0II<A 

where q{Y) is the surrogate function, and the parameter A is known as a trust region 
radius that defines the region in which q{Y) approximates p{y) or ^(P) well. Such 
a radius must be chosen carefully. It may need to be adjusted iteratively based on the 
ratio of the reduction in the surrogate and the reduction in the true objective function 



achieved by the solution to ( 23 1 



A commonly used surrogate function is the second order Taylor expansion of the true 
objective function. The minimizer of this particular surrogate gives a full step Newton 
direction. However, the Newton step may not satisfy the trust region constraint, thus 
may not be the solution to ( p3] ). 

The trust region subproblem can be solved either exactly or approximately depend- 
ing on the cost of evaluating q{\l/) ^^id its derivatives. If the second order Taylor expan- 
sion is chosen as the surrogate, most methods need to solve the Newton equation 

V2^((/)). = -V^(0), 

where V^q is the Hessian of the true objective at the current iterate Y- This equation 
can be solved approximately by the (linear) conjugate gradient algorithm when is 



positive definite. When V^^ is not positive definite, (23 1 can also be solved by follow- 
ing a negative curvature direction to the boundary of the trust region. These techniques 
are used in an efficient iterative procedure for solving a large-scale trust region sub- 
problem developed by Steihaug fl2J. The method requires compute the matrix vector 
product V^qv for some vector v. This product can be approximated by a finite difference 
approximation 

for some small rj. Therefore, it is not necessary to explicitly construct the Hessian of 
the objective function in Steihaug's method. 

3.2. Weighted Objective and Precondition 

The least squares objective function in ([2]) and Q can be expressed as 

1 

P(V) = ^lli\zi\-bi,\zi\-bi), 
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and 



1 

2,=i 



respectively, where {x,y) = x*y denotes the standard Euclidean inner product. This in- 
ner product can be replaced by a weighted inner product {x,y)B = x*By, where B is a 
symmetric positive definite matrix, to accelerate the convergence of iterative methods 
used to recover the unknown image Xj/. As we will show in section [6j the choice of 
B = Diag(fti)^^ is particularly useful for accelerating the convergence of all iterative 
methods we have looked at. To maintain numerical stability and reduce noise amplifi- 
cation, it is often necessary to add a small constant to the diagonal of B to prevent it 
from becoming singular or ill-conditioned. 

Another useful technique for accelerating iterative methods for solving uncon- 
strained minimization problem is preconditioning. Instead of minimizing p(v^) or 
e{\l/), we make a change of variable and minimize p(0) and s{<p), where <p = K\{f, 
and is a preconditioner that is usually required to be Hermitian and positive definite. 
The purpose of introducing the preconditioner K is to reduce the condition number of 
the Hessian of the objective function. A highly ill-conditioned Hessian often leads to 
slow convergence of an iterative method. A well-known example is the zig-zag behav- 
ior of the steepest descent algorithm when it is applied to the Rosenbrock function. 

It follows from the chain rule and (j7]l that the gradient of p(i/A) is simply 

where z, = FQiY- 

If we take the preconditioner to be the constant term on the diagonal blocks of H^^,, 
i.e., 

k 

K = '£Q*Qi, (24) 

(=1 

which is a diagonal matrix, the gradient of p simply becomes 

k \ -1 / 



V;3(V) = \ 



v-(Ee;a)"(Ea™g(A),,) 



and the corresponding preconditioned steepest descent algorithm with a constant step 
length of 2 yields the following updating formula: 




where z^' = FQi\j/^^\ This updating formula is identical to that used in the error re- 
duction algorithm or alternate projection algorithm mentioned in the standard phase 
retrieval literature |[T3l . which is guaranteed to converge to at least a local minimizer as 
shown in section |4l 
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3.3. Line Search 

The global convergence of an unconstrained optimization algorithm depends on effec- 
tive line search strategies. Assuming that is a descent direction for p{\l/) at v^, i.e., 
Vp{yV'P < 0' we would like to seek an appropriate step length a so that 

p(i/A+a0) <p(i/a). 

One way to obtain such a step length is to minimize the scalar function ^{cc) = 
p(i/A+ a^) with respect to a. This can be done by applying the Newton's method to 
generate a sequence of a's that satisfy 

^^^^ 

and accepting an a, that satisfies 

§(a,)<ci^(0), and |^'(a,-)l < C2|^'(«,--i)|, 
for some small constants < ci,C2 < 1. In order to obtain the values of ^'{cCi) and 



£,"{(Xi) required in (25 1, we need to evaluate the directional derivative and curvature of 



p at 1^+ along the search direction <j). That is, 

^'{ai) = 2Re(0*Vp(vA + a,-0)) 

^"{ai) = Tp (!//'+«;(/), 0). 

Although these derivative calculations will incur additional computation, the cost of 
these computation can be kept at a minimal by keeping FQiip in memory as we will 
discuss at the end of this section. 

We should note that the Newton's method may not always succeed in finding an ap- 
propriate a due to the fact that ^ (a) is generally not globally convex. The convergence 
of the Newton's method will depend on the choice of the starting guess. When a good 
starting guess is chosen, we typically need to run only a few Newton iterations to reach 
a reasonable a value. Because the purpose of line search is to identify a step length that 
would lead to a sufficient reduction in the objective function, it is not necessary to find 
the actual minimizer of (a). 

However, an exact line search may not satisfy what is known as the second Wolfe 
condition 

Vp(vA + a0)>>C2Vp(vA)^(/), 

where < C2 < 1 is typically chosen to be 0.9. This condition on the change of the 
curvature of the objective function and the first Wolfe condition 

for some constant ci typically chosen to be 10^^^, which is a condition that guarantees 
a sufficient decrease in the objective function, are required to prove the global conver- 



gence of the sequence {v^^^^} generated by (21 1 in many optimization algorithms. Line 
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search techniques that satisfy both Wolfe conditions can be found in ||T4|| and many 
other standard optimization textbooks |[T5]| . We should note that these techniques may 
also be sensitive to the choice of the initial guesses to the step length as well as the 
choice of ci and C2 parameters. When a poor initial guess is chosen, these techniques 
can yield a values that are too small. Strategies for choosing a good starting guess of 
a can be found in |[T5l also. 

Regardless which line search technique one uses, one typically needs to evaluate 
the objective function £(i/A + a0) or p(i// + a0) and its directional derivatives for a 
number of different a values. If we compute \j/ = \{f+ C)C<p first and use the formulae 
given in Q, ([3]l, ([8]l and ^ to evaluate the objective function and directional derivative 
(by replacing y with \ff), each evaluation will perform k FFTs. To reduce the cost of 
line search, we may evaluate f, = FQi(j) in advance so that no FFT is required in the the 
hne search procedure itself. For example, to evaluate (|2]), we can simply compute 



Pi¥) = J^\\\zi + octi\-bi 



i=l 



where z, =FQi\i/ and tj have been computed already. Similarly, the direction derivative 
of p at a0 can be obtained from 



I Re 

(=1 



t*{zi + ati)-t*Diag 



Zi + atj 
\zi + ati\ 



Also, notice that no FFT is required in the curvature calculation ( 20 1 once 's are avail- 
able. 



4. Fixed-Point Iteration and Projection Algoritlims 

An alternative approach to finding a minimizer of (|2]) is to set its gradient to zero and 
seek Y that satisfies the first order necessary condition of the minimization problem. 
If LLi QlQi is nonsingular, by setting Vp(i/A) = 
to 0, we obtain 

¥ = fi¥) 

where 



yk 



Q*Qi¥-QlF*Dmgi^^jb 

(26) 



(=1 



-1 



Ie;/^*Diag 

(=1 



bi 



(27) 



Recall that Zi = FQiY- Clearly, i/A is a fixed point of the function /. 



A simple iterative technique one may use to find the solution to ( 27 1 is the fixed point 
iteration that has the form 

Replacing / with the right hand size of ( [27| ) yields 



k 
[ 

i=l 



Xe*F*Diag 



(^) 



(28) 
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where z^P = FQi\j/^^\ This is the same sequence of iterates produced in what is known 
as the error reduction algorithm in standard phase retrieval literature |[T3i . This method 
is also known as the alternate projection algorithm for reasons to be discussed below. 



It is easy to verify that the updating formula in ( 28 1 is identical to that produced by 
a preconditioned steepest descent algorithm in which the preconditioner K is chosen to 
he K = Ya^i Q.*iQi, and a constant step length of 2 is taken at each iteration, i.e., 



The sequence of iterates {^f^^''} produced by (28) is guaranteed to converge to the 
fixed point of f{^f) from any starting point {i//'")}, if the spectral radius (i.e., the largest 
eigenvalue) of the Jacobian of / (with respect to y/) is strictly less than 1. Because the 



function / in ( 26 1 can be viewed as a function of and y/, we should examine the 
Jacobian matrix of the system 




£!2*F*Diag 



£efF^Diag^U, 



(29) 
(30) 



where ( 30 1 is simply the conjugate of (29 1. It is not difficult to show that this Jacobian 
matrix has the form 











K — 2H^^ 

_9rjP 



(31) 



where H^^,, H^^, H^^ and H^^ are as defined in (jlsj), ( |l7| , ( jisj ) and ( 16 1 respectively. 
If (A, 0) is an eigenpair of /, we can easily showmat 



TTP 

^ if/})/ 

H- /7- - 



(1-A) 



K 
K 



If we again let tjj = \tjj\e'^J', l^ji = \l^ji\e'^J' and j5ji be the jth component of the vectors 
ti = FQi^, Zi = FQiY and bi respectively, we can easily show that 



Ll,LU^iA^ji-ejd\tji\'h/K 



lLi £7=1 ki'P 



(32) 



Clearly, when jSy,- < |(^y;| for all j = 1,2, ...,m and / = 1,2, ...n, |A| < 1, and the fixed 
point iteration is guaranteed to converge to at least a local minimizer of p. 

The fixed point of / may also be obtained by applying Newton's algorithm to seek 
the root of r{Y) = 0> where r{\i/) = y — fiv)- The Newton's method produces a se- 
quences of iterates { y^^'^ } that satisfy 
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where the J matrix here is the Jacobian of r with respect to y^f. This approach is equiv- 
alent to applying Newton's algorithm (with appropriate Une search and trust region 
strategies) to minimize p{y)- 

Successive approximations to J can be constructed from i/a^^) and r(i//^(^)) using Broy- 
den's technique. This is similar to the Quasi-Newton algorithm discussed in the previ- 
ous section. As a special case, replacing J with the crudest approximation, the identity 
matrix /, yields the standard error reduction algorithm. 



If we multiply ( 29 1 from the left by 2,- for / = 1 , 2, . . . , and let y'^^'^ = Q^^^^i , where 



Q = {Q\ Q*2 - eP*, we obtain 



y 



■■PqPf(./'^), (33) 



where Pg = e(e*e)"^!2*, and 



My)=F*^^-b, 



where F = Diag and = (Z^f Z?^ ••• ^f)^- 

Because a fixed point y of PqPf is in the range of Q, which is typically full rank 
when mk > n, we may recover the corresponding fixed point of / from y via the least 
squares solution y^^^ = {Q*Qy^Q*y^^^- 

This nonlinear map is the composition of a (linear) orthogonal projector Pq and 



a (nonlinear) Fourier magnitude projector Pp. A fixed point iteration based on (33 1 



is also called alternating projection (AP) algorithm in the phase retrieval literature 



because the approximation to the solution of (33 1 is obtained by applying Pq and Pp in 
an alternating fashion. 

It is easy to verify that Pp is indeed a projection operator in the sense that 

ll-ffl)') — 3'll < — jII for all w G {w|w = /V(h')}. (34) 

This property of Pp, together with the fact that Pq is an orthogonal projection operator, 
i.e. llPeJ — jII < — jII for all w G Range(2), allows us to show that the residual 
error \\PQPp{y^-^^) —y^-^^\\ decreases monotonically in the AP algorithm. The proof of 
this observation was shown by Fienup in |[T6l . which we summarize below. 

Let y^^) be the vector produced in the ^-th AP iterate. Clearly, j'^^ G Range(2). Be- 
cause Pq is an orthogonal projector, we have 

\\PQPp{y^''^)-Pp{y('))\\ < ||PePH/))-/^ll = 11/+') (35) 



Because Pp{y'^'-^) G {w\w = Pp{w)}, it follows from (34) that 



|p^(/+l))_3;(^+l)|| = \\Pp{PQPp(yV)))-PQP,,(yV))\\ < \\PQPj,(yW)-P^l 



(36) 



Consequently, we can deduce from p5h and (36 1 that 



||P^(/'+i))_/+i)||<||/+i)_3,(. 
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Finally, it follows from the following inequality 



and the fact that G Range(g) that 

\\y{i+2)_ym)\\^\\y{m)_yW\\^ (37) 



The equality in ( 37 1 holds only when Pf {y^^^ ) = , when convergence is reached. 



The inequality ( 37 1 shows that the AP algorithm converges to a stationary point. 



However, the convergence can be extremely slow because 

||,(^+l)|| = ||^3,(^+l)|| = ||3,(^+l)|| = < \\Pp{y('))\\ = \\b\\ 



and many of the terms pji/i^ji, i = \,2,...,k and j = 1,2, in (32i may be great 
than 1. Only when y^^^ is very close to the fixed point of PqPf, the spectral radius of the 



Jacobian of (27 1 may become much smaller than 1 in ( [32| ) due to the reduction effect 
of the sin^(/iy,- — dji) terms. 

The simple alternating projection algorithm has been extended to the hybrid input- 
output (HIO) algorithm |[T6l . the relaxed averaged alternating reflection (RAAR) algo- 
rithm IUtII . and many other variants ||T81IT3]| in the phase retrieval literature. Just to give 
a few examples, in the HIO and RAAR algorithms, the approximation to the solutions 



of ( [30| ) and ( |33| ) are updated by 

/+!) = [PqPp + {I -PQ){i-pPF)]y'^'\ mo, 

y(W) ^ [2I5PqPf + {I -2I5)Pf + 15 {Pq-I)]/'\ RAAR. 
where j8 is a constant often chosen to be between and 1 . 

Although these algorithms tend to accelerate the convergence of y^^\ their conver- 
gence behavior is less predictable and not well understood. 



5. Wigner Deconvolution 

Long before iterative methods were applied to solve the ptychography problem, Ro- 
denburg and his colleagues suggested that the problem can be solved via what they 
called Wigner deconvolution [9 |. 

To explain the basic idea behind Wigner deconvolution, we need to state a con- 
tinuum version of the ptychography problem. If the set of translation vectors {x} 
forms a continuum in 2D, then it can be shown lfT9ll that the Fourier transform of 
yl^ = |^{a(r)i//(r + x)}p with respect to x, which we denote by ^^xiyl}, can be writ- 
ten as the convolution of two functions with respect to r', i.e., 

^Aylir)} = [A(r')A(r' + x')] V mr'mr'-x% (38) 

where A(r') = ^{a{r)}, *P(r') = .^{\j/{r)}, A denotes the conjugate of A, *F denotes 
the conjugate of *F, and v denotes a convolution operation with respect to r'. Note that 
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'^xiy'xi^)} is a function of x'. The Fourier transform of A(r')A(r' + x') or *P(r')*l'(r' — 
x') is called a Wigner distribution in 191. 

The Fourier transforms used in the definition of A(r') and ^^(r') can be replaced by 
discrete Fourier transforms (DFT) if both a{r) and i/A(r) are band- limited and they are 
sampled at or beyond the Nyquist frequency. The Fourier transform of yl^ with respect 
to X can be replaced by a DFT only if the translation vector x is sampled at or beyond 
the Nyquist frequency of V'^(r). 

We will define a fully sampled *F(r') by a column vector 

/= (/1/2 ■■■fnY ■, 

where /,• = *I'(r-)- Note that, when appeared by itself in *F, the variable x' and r' can be 
used interchangeably, i.e., ft = *I'(x-) holds also. 

There are at least two ways to represent *F(r')*F(r' — x') systematically in a vector 
form. We choose to write it as 



«(/) 



/ Diag(/)Pf/ \ 
\ Diag(/)P,[// 



where Pi is a permutation matrix that shifts / cyclically by / — 1 pixels, and / denotes 
the conjugate of /. This representation corresponds to writing down *F(r')*P(r' — x') 
by having r' as the fastest changing index. By enumerating x' first, we can represent 
*P(r')*F(r' — x') in an alternative form 



nu{f) 



( fyPU \ 
fiPU 



\ fnPU ) 



(39) 



where n is an x «^ permutation matrix that reorders x' and r'. 

Employing the same ordering we use to represent the fully sampled *I'(r')*F(r' — x'), 
we can express the convolution kernel A(r')A(r' + x') by a matrix W . This matrix has 
a block diagonal form, i.e.. 



W 



V 



W2 



where is a block cyclic matrix with cyclic blocks (BCCB). This type of BCCB 
structure allow the convolution W,Diag (/) to be carried out efficiently by using 
FFTs. 
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Using the notation established above, we can now express the sampled version of 

nwuif) = P, 



(38l as 



where 



P=FU 



( b\ \ 



\ 



and F is the matrix representation of a 2D discrete Fourier transform of an image with 
n pixels. 

\fW is nonsingular, i.e., W,- is nonsingular for all / = 1,2, we can recover «(/)) 
by simply inverting W , i.e. 

u{f) = W-^U^P, (40) 

Equation ( [401 ) represents a deconvolution process, and is known as Wigner deconvolu- 
tion ||9l. The application ofW^^ to the vector DJP can be achieved through an FFT 
based fast deconvolution or an iterative solver such as the conjugate gradient algorithm. 
We do not need to explicitly invert the matrix W. If W is singular or ill-conditioned, we 
may add a small constant to the diagonal of W to regularize the deconvolution. 

Applying the permutation n to u{f) allows us to rewrite the solution of the decon- 
volution problem in the form of (|39]). If /,■ 7^ for / = 1,2, we define c, = 1//;. 
Furthermore, let us define = YW^Yl^ , which can be partitioned as 

( 8\\ 



^1 



^2 
82 



where gj S 



-<nx 1 



By treating c,- as a separate set of unknowns, with the exception of of ci, which we 



will set to an arbitrary constant, e.g., 1, we can turn (40 1 into a linear least squares 
problem by minimizing the norm of 



Pi 
Pi 



-Diag(gi) 



\ / / \ 

C2 



V 



-Diag(|2) J \cn J 



\ / 



(41) 



The minimization of ||r|| can be easily solved by back substitution. This is essentially 
the "stepping out" procedure described in |9]. The reason that we can set ci to an 
arbitrary constant is that we are often interested in the relative amplitudes and phases 
of "^{r), multiplying the entire image i//(r) or *P(r') by a constant does not change the 
quality of the image. 
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It may seem that the use of iterative method is not necessary if we can solve the 
ptychography problem by Wigner deconvolution, which can be viewed as a linear in- 
version scheme. However, as we will show below, the Wigner deconvolution problem 
cannot be solved directly (using an FFT based deconvolution scheme) if x is sampled 
below the Nyquist frequency, i.e. when the amount of probe translation is larger than 
the resolution of the image to be reconstructed. 

When X is sampled below the Nyquist frequency, which can occur in an experiment, 



we must modify (38 1 by introducing an aliasing operator S^'- Because a{r) is a lo- 



calized window in practice, A(r') is subsampled in the reciprocal space. Therefore a 



subsampling operator must be included in a finite-dimensional analog of (38 1 to 
account for this effect. 



With these additional operators, the sampled version of equation (38 1 can be ex- 
pressed as 

S^>nSr'Wu{f)=P, (42) 

where the dimensions of n, W, u{f) and Ip- need to be adjusted to reflect fewer pixel 
samples per diffraction frame and fewer frames resulting from increased distance x be- 
tween two adjacent frames. For simplicity, let us assume that / and each frame are 
square images with n and m pixels respectively, and the distance between two adjacent 
frames is dx (in either the horizontal or the vertical direction). Then, the aliasing opera- 



tor 5x' in (42 1 is a block diagonal matrix consisting of Uf diagonal blocks of dimension 
mxn, where rif = Y^fnj \fm\ . The subsampling operator 5r' is a block diagonal matrix 
consisting of m diagonal blocks of dimension rif xn, and H is an rifm x rifm row per- 
mutation matrix that reshuffles the rows of Sr'Wu{f) so that x' is the fastest changing 
index. For ID signals, a diagonal block of S^' can be represented by 

I,n ' ' ' ) ) 

where /„, is an m x m identity matrix. Similarly, a typical diagonal block of Sr' has the 
form 

(••• In, •••), 

where 7,,^ is an rif x rif identity matrix. 

Because S^', H and S^' are not square matrices, we cannot obtain «(/) by simply 
applying the inverse of these matrices and W^^ tob^. 

Instead, we must recover /, hence the fully sampled \j/{r), by solving the following 
nonhnear least squares problem 

mm\\S^,nSr'Wu{f)-Pf. (43) 

It is not difficult to see that the objective function in the nonlinear least squares prob- 
lem ( [43] ) is equivalent to ([3]). Therefore, iterative optimization techniques applied to 
minimize (|3]) can be used to solve (43 1 also. However, the evaluation of the objective 



function in ( |43| ) and its derivatives, which we will not show here, are more costly be- 
cause evaluating u{f) requires at least ff{n^) operations, and multiplying W with u{f) 
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requires an additional ff(n'^log{n)) operations. This operation count is much higher 
than that associated with evaluating ([3]), which is ^(m?ijlog(?iy ) +mnf). 

We should mention that, if one is interested a reconstruction of limited resolution, /, 
which is a cropped version of x'), the objective function in (43 1 can be modified 
to become 

\\S^U{S,WSl,)u{f)-bY, 

where u{f) G C'"^^ Furthermore, if the translation of the frame x' is chosen to be 
commensurate with the size of each frame, e.g., x' = ^Jnjm, then Syj becomes an iden- 
tity matrix. Consequently, one may obtain (and subsequently /) by performing a 
Wigner deconvolution. 



6. Numerical Examples 

In this section, we demonstrate and compare the convergence of iterative algorithms 
for ptychographic reconstruction using two test images. The first test image is a 256 x 
256 real-valued cameraman image shown in Figure |2] The image is often used in the 
image processing community to test image reconstruction and restoration algorithms. 
The second test image is a complex valued image. It also contains 256 x 256 pixels 
that correspond to the complex transmission coefficients of a collection of gold balls 
embedded in some medium. The amplitude and phase angles of these pixels are shown 
in Figure |3] 




Fig. 2. The cameraman test image. 
All numerical examples presented in this paper are performed in MATLAB. 

6. 1. Comparison of Convergence Rate 

In this section, we show the convergence behavior of different iterative algorithms we 
discussed in section[3]by numerical experiments. In the cameraman image reconstruc- 
tion experiment, we choose the illuminating probe fl'(r) to be a 64 x 64 binary probe 
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so 100 ISO 200 2S0 

(a) Amplitude 



so 100 ISO 200 2S0 

(b) Phase 



Fig. 3. The amplitude and phase of the transmission coefficient of a collection of 
gold balls. 



shown in Figure 4(a) The pixels within the 32 x 32 square at the center of the probe 



assume the value of 1. All other pixels take the value of 0. The zero padding of the inner 
32 X 32 square ensures that the diffraction pattern of a 64 x 64 frame associated with 
this probe is oversampled in the reciprocal space. In the gold ball image reconstruction 
experiment, the illuminating probe is chosen to be the amplitude of the Fourier trans- 
form of an annular ring with inner radius of ri 5.4 and outer radius of r2 ~ 19.4. This 
probe mimics the true illumination used in a physical experiment. 





(a) The binary probe used in the reconstruction 
of the cameraman image. 



(b) The probe used in the reconstruction of the 
gold ball image. 



Fig. 4. The illuminating probes fl(r) used in ptychographic reconstructions of the 
cameraman and gold ball images. 



In the cameraman experiment, the probe is translated by 8 pixels at a time in ei- 
ther horizontal or vertical direction. To prepare a stack of k diffraction images bi, 
i = l,2,...,k, we start from the upper left corner of the true image, extract a 64 x 64 
frame, and multiply it with the probe, and then apply a 2D FFT to the product. The 
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magnitude of transform is recorded and saved before we move either horizontally or 

vertically to obtain the next frame. If the lower right corner of the frame goes outside of 

the image (which does not happen in this particular case), we simply "wrap the probe 

around" the image as if the image is periodically extended. As a result, the total number 

of diffraction frames we use for each reconstruction is 

, 256 256 
k=— — = 1024. 



As we will show in section [6~4l the size of translation, which determines the amount 
of overlap between adjacent frames, has a noticeable effect on the convergence of the 
iterative reconstruction algorithms. 

Figure [5] shows the convergence history of several iterative algorithms discussed in 
section |3] when they are applied to the diffraction frames extracted from the cameraman 
image. We plot both the relative residual norm defined by 



res = ^ , (44) 

Vi?=ill^<f 

where = I^GjV^^^^I and I is the iteration number, and the relative error of the 
reconstructed image defined by 



err - 



In these runs, an exact line search is used in the steepest descent (SD), nonlinear 



• NT 




iteration number 



iteration number 



(a) Change of the relative residual norm (res) for (b) Change of the relative error (err) for the re- 



the reconstruction of the cameraman image. 



construction of the cameraman image. 



Fig. 5. A comparison of the convergence behavior of different iterative ptycho- 
graphic reconstruction algorithms for the cameraman image. 

conjugate gradient (CG). The Steihaug's trust region technique implemented in EOl is 
used in the Newton's method (NT). We set the starting guess of the solution i// to 
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It is clear from Figure [5] that NT converges mucli faster tlian tlie other algorithms. Its 
performance is followed by the CG algorithm which is much faster than the error re- 
duction (ER), SD, Gauss-Newton (GN) and the hybrid input-output (HIO) algorithms. 
Similar convergence behavior is observed when other random starting guesses are used, 
although occasionally, a random starting guess can lead to stagnation or convergence 
to a local minimizer. We will discuss this issue in section 16.31 We set the maximum 
number of iterations allowed in all runs to 30. This is somewhat excessive for both 
NT and CG algorithms. Typically, when the relative error of the reconstructed image 
falls below 10^^, it is nearly impossible to visually distinguish the reconstruction from 
the true image. When the relative error is larger, the reconstructed cameraman images 



may contain visible artifacts such as those shown in Figures 6(a) and 6(b) which are 
produced at the end of the 30th ER and SD iterations respectively. 





(a) ER reconstruction 



(b) SD reconstruction 



Fig. 6. The reconstructed cameraman images by ER and SD algorithms contain 
visible ringing artifacts. 



It is somewhat surprising that GN performs poorly on this problem. We believe the 
problem is that we used the MATLAB implementation of the large-scale Gauss-Newton 
algorithm, i.e., the function Isqnonlin in the MATLAB's Optimization Toolbox, 
which does not handle functions of complex variable very well. Moreover, it is not easy 
to obtain the relative error associated with the approximate reconstruction produced at 
each iteration from this function. 

For the reconstruction of the gold ball image, we choose the starting guess to be 



-1 



V^^°^= (IG;G,| £Gpiag(^)Diag(|«,| 



(=1 



where m, is a complex random vector, and the real and imaginary part of each compo- 
nent has a uniform distribution within [—1,1]. 

In this experiment, the probe is translated by a larger amount (16 pixels) in either 
horizontal or vertical direction. Figure |7] shows the convergence history of ER, SD, 
CG, HIO, and NT. From Figure 7(a)[ it appears that CG is the best among all the 
methods we tried. The HIO algorithm performs well in the first 60 iterations, but then 
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stagnates. As we can see from FigurejTJthat the neither the residual norm nor the relative 
error associated with HIO changes monotonically. This is not completely surprising 
because HIO does not try to minimize either objective functions. For this example, the 
performance of NT lags behind CG by a large margin although both algorithms exhibit 
monotonic convergence with a more predictable error reduction. We should mention 
that to measure the relative error associated with a reconstructed gold ball image Y^^^ > 
we need to multiply it by a constant phase factor / first, i.e., the relative error is defined 
as 



err - 



I YY^^^ ■ 





iteration number 



40 60 80 

iteration number 



(a) Change of the relative residual norm (res) for (b) Change of the relative error (err) for the re- 
the reconstruction of the gold ball image. construction of the gold ball image. 

Fig. 7. A comparison of the convergence behavior of different iterative ptycho- 
graphic reconstruction algorithms for the gold ball image. 

In Figure [8} we can clearly see that the magnitude of the reconstructed images pro- 
duced by CG (Figure 8(a)[ ) and HIO (Figure 8(c)| ) are nearly indistinguishable from 
the magnitude of the true image. However, the phase angles of the reconstructed image 



produced by CG (Figure 8(d) i appear to be better than those produced by HIO, which 
is indicated by the magnitude of the absolute errors l/y/^^) — y\ shown in Figures 8(b) 
and [8(d)] 

6.2. The Effect of Preconditioning 

As we indicated in SectionlT2) the use of a preconditioner can enhance the convergence 



of SD and CG. A natural preconditioner that is easy to construct is ( p4| ). However, this 
preconditioner is only effective, when the condition number of K is relatively large. 
For the binary probe used in the reconstruction of the cameraman image, K = 41. The 
condition number of this matrix is 1. Hence, using this preconditioner has no effect on 



the convergence of the CG iteration, as we can clearly see in Figure 9(a) The condition 
number associated with the probe used in the gold ball image reconstruction is around 



4.5. Hence the effect of the preconditioner is negligible as we can see from Figure 9(b) 
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50 100 150 200 250 



50 100 150 200 250 



(a) The magnitude of the reconstructed gold ball 
image produced by the CG algorithm. 




50 100 150 200 250 



(c) The magnitude of the reconstructed gold ball 
image produced by the HIO algorithm. 



(b) The magnitude of the error associated with 
the reconstructed gold ball image produced by 
the CG algorithm. 




50 100 150 200 250 



(d) The magnitude of the error associated with 
the reconstructed gold ball image produced by 
the HIO algorithm. 



Fig. 8. The reconstructed cameraman images produced by CG and HIO. 




iteration number iteration number 

(a) The effect of the preconditioner on the con- (b) The effect of the preconditioner on the con- 
vergence of SD. vergence of CG. 

Fig. 9. The effect of a preconditioner on the convergence of the CG algorithms 
appUed to cameraman and gold ball image reconstruction. 
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6.3. Local Minimizer and the Choice of the Objective Function 



As we indicated in section |2.2[ based on the analytic Hessian and curvature expres- 
sion, that neither e{\lf) nor p{'^f) is globally convex. This observation suggests that all 
iterative optimization algorithm discussed above may converge to a local minimizer. 
Although we found that in practice, local minimizers are not easy to find, they do exist 
as the following example show. 

In order to find a local minimizer, we construct many random starting guesses using 
the MATLAB rand function. To save time, we chose to reconstruct a 64 x 64 subimage 
of the cameraman image shown in Figure [2] This subimage is shown in Figure 12(a) A 
16x16 binary probe that has a value 1 in the 8x8 center of the probe and elsewhere is 
used. The diffraction stack consisting of 64 diffraction images is obtained by translating 
the probe 4 pixels a time in either the horizontal and vertical direction. 



Figure 10 shows that one of the random starting guesses lead to the convergence of 



the CG algorithm to a local minimizer. In particular, the relative residual ( 44 1 which is 
proport ional t o the objective function p stagnates around 0.9 after the first 15 iterations 
(Figure 10(a)i, whereas the relative gradient ||Vp(i/A(^')||/||i//|| decreases to 10^*^ after 
40 iterations. 

Figure 1 12(b) shows how the reconstructed image compares with the true image for 
this particular starting guess used. In this case, the local minimizer appears to contain 
visible artifacts in a small region near top of the tripod. The amplitude of this localized 



error is also revealed in the relative error plot shown in Figure |1 l(a)| The phase error 
associated with a particular frame of the reconstruction obtained from 



QiW Qi¥ 



\QiV\ \QiV\ 

for some particular Qj is shown in Figure [Tl(b)| 



£ 10 



20 30 
iteration number 



iio° 



20 30 
iteration number 



(a) Change of the relative residual norm (res). (b) Change of the relative gradient. 

Fig. 10. The convergence of CG to a local minimizer. 



We should also note that for this particular starting guess, all methods we tried con- 
verged to the same local minmizer. This is not all that surprising. It simply shows 
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(a) Amplitude error in the reconstruct image 



(b) Phase error in degrees associated with a par- 
ticular frame 



Fig. 11. The error associated with a local minimizer. 



iii 




z 



CI 



(a) True image. (b) The reconstructed image (a local minimizer). 

Fig. 12. The artifacts produced by a local minimizer of p. 



(empirically) that local minimizers of Q exists, and our starting guess is sufficiently 
close to it. 

However, what is interesting is that if we choose to minimize Q by using any one 
of the iterative methods discussed above from the same starting guess, we are able to 



obtain the correct solution. For examples. Figure 13(a) shows that when the NT applied 
to the weighted (scaled) objective function 



2\T 



Diag(Z70 '(|z,{ 



(45) 



' !=1 



where |z,| = \FQi\j/\ and bi = \FQi\j/\, an accurate reconstruction can be obtained in 
roughly 350 iterations. Admittedly, the convergence rate is much slower in this case 
when compared to the convergence of NT when it's applied to Q from a different 
starting point. The convergence is even slower if no weighting (or scaling) is used, 
i.e. when (|3]) is used as the objective function. However, the fact that convergence 
can be reached for (45 1 but not ([2]) from the same starting point is quite interesting. 
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Furthermore, Figure 13(b)| shows that if we take the local minimizer returned from an 
iterative minimization of (|2]) as the starting guess for minimizing (45 1, convergence can 
be reached in 12 iterations. This experiment suggests that it may be useful to have a 
hybrid optimization scheme in which Q is minimized first. If a local minimizer of ([2]) 
is identified, one can then try to minimize ( [45] ) starting from the local minimizer of (|2|). 




-•-with scaling 
■* -without scaiing 



160 200 250 
iteration number 




iteration number 



(a) The convergence of the NT algorithm when it (b) The convergence of the NT algorithm when 
is applied to l|3j (red) and l |45[ l. The starting guess the starting guess is chosen to be the local mini- 
chosen in these runs is the same one used in the mizer shown in Figure [T2(b)| 
minimization of j2j. 



Fig. 13. The convergence of the NT algorithm when applied to ([3]) (red) and ( [45 
(blue). 



6.4. The Effect of Overlapping on the Convergence of Iterative Algorithm 

As we alluded to earlier, the amount of overlap between two adjacent diffraction frames 
has a noticeable effect on the convergence of optimization based iteration algorithms 
(e.g., CG, NT, SD etc.) used to reconstruct the true image. Although we currently do 
not have a clear way to quantify such an effect mathematically, the following examples 
demonstrate this effect. 

In the first example, we try to reconstruct the gold ball image from four different 
diffraction stacks. Each stack contains a set of 64 x 64 diffraction frames. These frames 



are generated by translating the probe shown in Figure 4(b) by different amount in 
horizontal and vertical directions. The larger the translation, the smaller the overlap is 



between two adjacent images. Figure 15(a) shows that CG converges very slowly when 
the diffraction stack contains diffraction frames obtained by translating the probe 20 
pixels at a time (the black curve). Faster convergence is observed when the amount of 
translation is decreased to Ax = 16, 12, 8. It is interesting to see from Figure 15(b) that 
the amount of overlap does not affect the convergence of the HIO algorithm. 

In the second example, we try to reconstruct the gold ball image from 1024 diffrac- 
tion frames of 128 x 128 pixels. The illumination function is similar to that used in 
Figure[4| It is scaled by a factor of 2 to 128 x 128 pixels. The probe FWHM (full width 
at half maximum) is 30 pixels. We choose to fix the number of frames. So the recon- 
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structed area increases with step size. When probe is near the edge of the image, we 
"wrap it around the edge" as if the image itself is periodically extended. The overlap 
is varied by changing the step size Ajc. The larger the Ajc, the smaller the amount of 
overlap. 

The starting point is produced from a random number generator for each test. A 
range of step sizes between 6 and 30 pixels have been tried. For a fixed step size, the 
test is repeated 100 times. We observe that the step size Ajc does not influence the con- 
vergence rate up to Ax ~ 20. Figures [T4(a)| and |14(c)|show that the conjugate gradient 



method converges in less than 400 iterations, while the RAAR algorithm requires al- 



most 1500 iterations. Figures 14(b) and 14(d) illustrate the percentage of successful 
runs started from a random guess for each of the step sizes < Ax < 30. The per- 
centage of successful runs (shown in color) is plotted against the maximum number of 
allowed iterations. When Ax < 20, both CG and RAAR converge nearly 100% of the 
time when a relatively small number of iterations are used in these methods. However, 
when 20 < Ax < 25, more iterations are required to ensure the convergence of CG and 
RAAR. When 25 < Ax < 30, CG appears to stagnate for all random starting guesses 
we tried, whereas RAAR can still converge when a very large number of iterations are 
taken. 

To explain the effect of overlapping on the convergence of optimization based iter- 
ative algorithms such as the nonlinear CG, we examine the structure of the Hessian of 
the objective function p in It follows from ( 15 l-( 16 1 that the HP can be written as 



HP = {{FQ)* {FQY 



Bn 
B21 



fill 
B22 



FQ 
FQ 



(46) 



where Bi 1 = B22 and B12 = B21 diagonal, F is a block diagonal matrix of discrete 

Fourier transforms, i.e. F = Diag(F,F, ...,F), and Q = {Q\ Q\ ... Ql)*. The diagonal 
elements of Bn and B12 are simply 1 — Pjj/{2i^jj) and I5jii^jj/(2\i^ji\^) respectively for 
/ = \ ,2,...,k and 7 = 1,2, ...,m. 

We will show that HP is diagonal dominant when there is a sufficient amount of 
overlap between adjacent diffraction frames. To simplify our discussion, let us assume 
for the moment that bj is a ID diffraction pattern obtained from a binary probe that 
illuminates three pixels at a time, and the probe is translated one pixel at a time so that 
the image frame that produces bi overlaps with that produces bi-\ by two pixels. In this 
case, the FQ term in (46 1 has the form 
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where /; is the ith column of F. 
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iterations 



(a) convergence of CG from 100 random starts (b) percentage tests that converge to err < 10 

Ajc = 20 





(c) convergence of RAAR from 100 random 
starts Ax: = 20 



(d) percentage of RAAR iterations that converge 
to an error of 10~* 



Fig. 14. The convergence rate of the CG and RAAR methods from different ran- 
dom starting points. 
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As a result, a typical diagonal term of has the form 

hP. = f*Di^2fi + f*Di^Ji + f*Difi = tmct{Di^2 + Di^, +Dt), (47) 

where D,- is a diagonal matrix that contains elements 1 — Pji/ (l^ji) for 7 = 1 , 2, 3. 

When Y is near the solution, z, is close to Z?,. Hence, D; is likely to contain positive 
entries only. Therefore, the diagonal elements of HP are likely to be much larger com- 
pared to the nonzero off-diagonal elements which contain terms in the form of either 
fJDiff and its conjugate, where j ^ £, or fjEiff and its conjugate, where is a di- 
agonal matrix (and part of B12) that contains elements j8j,(^?/(2|^y;p) for j = 1,2,3. 
Due to the phase difference between fj and /<?, D,'s do not add up "coherently" on 
the off-diagonal of HP as they do on the diagonal. Neither do nonzero entries in Efs 
add up coherently on the off-diagonal blocks of HP either. Hence, the matrix HP be- 
comes diagonal dominant when there is larger amount of overlap between two adjacent 
frames. In fact, the diagonal of HP may become so dominant that the spectral prop- 
erty of HP is determined largely by the diagonal part of the matrix, which is typically 
well conditioned due to the averaging of D, in ( [47] ). This observation provides an in- 
tuitive explaination on why increasing the amount of overlap between adjacent frames 
tends to improve the convergence rate of CG and other optimization based iterative 
ptychographical phase retrieval algorithms. Although this is not a precise analysis of 
the spectral property of HP, the analysis does match with observations made in our 
numerical expriments. Moreover, this type of analysis can be extended to the 2D case 
in which F is represented as a tensor product of two ID discrete Fourier transforms. 
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(a) The effect of overlapping on the convergence (b) The effect of overlapping on the convergence 
of CG for the gold ball image reconstruction. of HIO for the gold ball image reconstruction. 



Fig. 15. The effect of overlapping on the convergence of CG and HIO algorithms. 



7. Conclusion 

We formulated the ptychographic phase retrieval problem as a nonlinear optimization 
problem and discussed how standard iterative optimization algorithms can be applied 
to solve this problem. 
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We showed that the optimization problems we solve are not globally convex. Hence 
standard optimization algorithms can produce local minimizers. However, the Hessian 
of the objective functions we minimize do have special structures that may be exploited. 

We compared the performance of several optimization algorithms and found that 
Newton's method with Steihaug's trust region technique gave the best performance on 
a real valued image. For a complex valued image, the nonhnear conjugate gradient 
algorithm appears to perform better. 

We discussed the effect of preconditioning on convergence of the CG algorithm. 
We also demonstrated it is possible for an optimization algorithm to converge to a local 
minimizer although in practice such type of convergence failure is rare, especially when 
the amount of overlap between two adjacent diffraction frames is large. 

We demonstrated by a numerical example that the convergence rate of an optimiza- 
tion algorithm depends on the amount of overlapping between two adjacent diffraction 
frames. We provided an intuitive analysis on why this occurs. More research is needed 
to provide a more precise analysis on this phenonmenon. 

We identified the connection between the optimization based approach with both 
Wigner deconvolution and projection algorithms often used in phase retrieval litera- 
tures. We pointed out the hmitation of Wigner deconvolution and showed that the op- 
timization based algorithm tend to perform better than projection algorithms such as 
HIO when the amount of overlap between adjacent images is sufficiently large. 
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